feat: multi-worker uvicorn with shared rate-limit backend (#68) #71
Merged
Conversation
Brainstormed design for issue #68. Two new opt-in env vars (PC2NUTS_WORKERS, PC2NUTS_RATE_LIMIT_STORAGE_URI) drive multi-worker uvicorn behind a fail-degraded shared rate-limit backend; defaults preserve the current single-worker / in-memory deploy byte-for-byte.

Key decisions captured in the spec:
- Option (a) Redis-backed slowapi (over edge-layer or per-process division), preserving the strict 120/min anonymous cap while keeping trusted-token bypass working.
- Fail-degraded behaviour (option III): on Redis unavailability, fall back to per-worker in-memory storage for a 30 s window before re-probing. Logs once per outage.
- Hard-fail at startup if PC2NUTS_WORKERS > 1 with no storage URI configured, so the cap can never silently loosen.
- uvicorn --workers (not gunicorn); shell-form CMD in Dockerfile to expand the env var.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
While prepping the implementation plan, found that slowapi 0.1.9 already ships exactly the fail-degraded behaviour we'd designed: Limiter(in_memory_fallback_enabled=True) routes to a per-process MemoryStorage when the primary raises, logs once per outage, and re-probes with exponential backoff (better than the fixed 30 s window we'd specified). The custom _FailDegradedStorage class is no longer needed, which drops a new module and four unit tests' worth of code we'd have to maintain. Spec sections 4.2, 4.3, 5, 6, 7, and 10 updated to reflect the library-feature approach. Architecture and operator-visible behaviour are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step-by-step TDD plan for issue #68 against the simplified spec at docs/superpowers/specs/2026-05-01-multi-worker-uvicorn-design.md. Seven tasks: Settings + validator, redis dep (ordered before the limiter test that exercises it), limiter module extraction, Dockerfile, README, CHANGELOG, final verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New Pydantic model validator hard-fails startup if PC2NUTS_WORKERS > 1 without PC2NUTS_RATE_LIMIT_STORAGE_URI configured, so the per-IP rate limit can never silently loosen under multi-worker. Defaults preserve current behaviour: workers=1, storage URI unset.
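A minimal sketch of such a validator, assuming pydantic v2 with pydantic-settings; field names, prefix handling and the error wording are illustrative rather than the repository's actual app/config.py:

```python
# Illustrative sketch only; field names and error text are assumptions.
from pydantic import model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="PC2NUTS_")

    workers: int = 1                           # PC2NUTS_WORKERS
    rate_limit_storage_uri: str | None = None  # PC2NUTS_RATE_LIMIT_STORAGE_URI

    @model_validator(mode="after")
    def _require_shared_storage_for_multi_worker(self) -> "Settings":
        # Refuse to start rather than silently enforce the cap per worker.
        if self.workers > 1 and not self.rate_limit_storage_uri:
            raise ValueError(
                "PC2NUTS_WORKERS > 1 requires PC2NUTS_RATE_LIMIT_STORAGE_URI "
                "so the per-IP rate limit stays global across workers"
            )
        return self
```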
Pulls in redis-py at the version that limits 5.8.0 expects; it is used only when PC2NUTS_RATE_LIMIT_STORAGE_URI is set. Single-host deployers who never configure shared storage pay the install-size cost but no runtime cost (redis is imported eagerly by limits.storage.RedisStorage at Limiter construction, but only when the storage URI is configured).
Without this, regenerating requirements.lock via the documented 'pip install -r requirements.txt && pip freeze > requirements.lock' flow would drop the redis pin added in dc56d8c. Using slowapi's [redis] extra (rather than pinning redis directly) keeps the declaration aligned with our actual dependency and lets slowapi's constraint chain choose the right transitive version of redis-py.
When PC2NUTS_RATE_LIMIT_STORAGE_URI is set, construct the Limiter with that storage URI and in_memory_fallback_enabled=True so transient backend outages fall back to per-process MemoryStorage. When unset, construction is byte-for-byte the previous inline call. slowapi's built-in fallback handles outage detection, once-per-outage WARNING logging, and exponential-backoff recovery probes.
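A sketch of that construction branch, assuming slowapi's stock get_remote_address key function; the real limiter module may wire key_func (and the exempt_when trusted-token bypass) differently:

```python
# Sketch only; the settings object and key function are assumptions.
from slowapi import Limiter
from slowapi.util import get_remote_address


def build_limiter(settings) -> Limiter:
    if settings.rate_limit_storage_uri:
        # Shared backend (e.g. redis://host:6379/0) with per-process
        # in-memory fallback during transient outages.
        return Limiter(
            key_func=get_remote_address,
            storage_uri=settings.rate_limit_storage_uri,
            in_memory_fallback_enabled=True,
        )
    # URI unset: the same in-memory construction as before.
    return Limiter(key_func=get_remote_address)
```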
One-line whitespace fix flagged by ruff format --check on Task 3's new test file. CI runs format --check, so this would have broken the lint job.
Switches CMD from exec-form to shell-form with 'exec uvicorn …' so ${PC2NUTS_WORKERS:-1} expands at container start while uvicorn remains the foreground PID-1 process for proper SIGTERM handling.

Default of 1 preserves current single-worker behaviour. Multi-worker mode also requires PC2NUTS_RATE_LIMIT_STORAGE_URI; the Settings validator (added in feat(config)) refuses to start otherwise.
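For illustration only, a shell-form CMD of this shape (the application module path, host and port are assumptions, not the real Dockerfile):

```dockerfile
# Shell-form CMD so the default expands at container start; `exec` replaces
# the shell so uvicorn stays PID 1 and receives SIGTERM directly.
CMD exec uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers ${PC2NUTS_WORKERS:-1}
```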
New 'Multi-worker deployment' subsection covers both env vars, the startup-validation guard for the unsafe combination, and the slowapi fail-degraded behaviour during a backend outage.
….txt (#68)

Reviewer caught that slowapi 0.1.9's [redis] extra pins 'redis>=3.4.1,<4.0.0' — a stale, six-year-old constraint that contradicts the redis==7.4.0 lock pin. Any operator running the documented 'pip install -r requirements.txt && pip freeze > requirements.lock' regeneration flow would silently downgrade to redis 3.5.3. limits[redis] exposes the modern constraint 'redis!=4.5.2,!=4.5.3,<8.0.0,>3', which matches the lock pin and is what we actually want. slowapi depends on limits anyway, so limits stays in the dependency chain either way and we lose no functionality by dropping the slowapi[redis] extra in favour of limits[redis]. Verified: pip install --dry-run -r requirements.txt now resolves redis to 7.4.0 cleanly, matching requirements.lock.
bk86a added a commit that referenced this pull request on May 1, 2026
PR #71 shipped multi-worker uvicorn behind a shared rate-limit backend. This commit captures the re-run of scripts/perf_test.sh against the post-#68 deployment so the open AC items on #68 ("memory headroom" and "verify approximately N× headroom on /lookup") have measured numbers rather than estimates.

Headlines:
- Realistic-corpus knee (Scenario B) moved from 30 → 35-38 RPS. Single-worker collapsed at 35 (p99 4.47 s); multi-worker absorbs 35 cleanly (p99 150 ms) and only saturates between 35 and 40.
- Hot-key plateau (Scenario A, persistent connections) roughly doubled: ~30 → ~50 RPS, with p99 at saturation 2.5× lower.
- Recommended operating point unchanged at 27 RPS — Scenario E (3-min sustained) still meets the p99 ≤ 200 ms SLO. The win is headroom (~10% → ~30-40%), not the operating point itself.

The 1.6× rather than 2× scaling is consistent with shared-edge TLS termination and Pydantic GIL contention being part of the cap, not just per-worker compute. Documented in the methodology notes.

Also adds a new "Rate-limit shared-storage verification" subsection: 130 anonymous requests against the published 120/minute cap from a single source IP yielded exactly 120 HTTP 200s and 10 HTTP 429s — conclusive evidence that the Redis sidecar is reachable from both workers and the cap is enforced globally rather than per-worker (the failure mode the startup validator at app/config.py:42-50 exists to prevent).

CHANGELOG entry under [Unreleased] summarises both the re-baseline and the perf_test.sh fix from the previous commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
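A hedged re-creation of that shared-storage check; the endpoint path, port and query parameter are assumptions, and the shipped numbers come from scripts/perf_test.sh rather than this snippet:

```python
# Hypothetical verification: fire 130 anonymous requests inside one minute
# and tally the status codes. URL details are illustrative only.
from collections import Counter

import requests

counts = Counter(
    requests.get("http://localhost:8000/lookup", params={"q": "example"}).status_code
    for _ in range(130)
)
# A globally enforced 120/minute cap should yield roughly {200: 120, 429: 10};
# a per-worker cap would let far more than 120 requests through.
print(counts)
```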
Summary
Implements #68: multi-worker uvicorn behind a shared rate-limit backend.
- New env vars: PC2NUTS_WORKERS (default 1) and PC2NUTS_RATE_LIMIT_STORAGE_URI (default unset).
- Startup hard-fails when PC2NUTS_WORKERS > 1 without a storage URI configured (Pydantic model validator) — prevents silent per-IP rate-limit cap loosening.
- in_memory_fallback_enabled=True handles transient backend outages with per-process MemoryStorage fallback and exponential-backoff re-probing — one WARNING per outage, INFO on recovery.
- With the storage URI unset, the Limiter is constructed exactly as before.

Spec: docs/superpowers/specs/2026-05-01-multi-worker-uvicorn-design.md
Plan: docs/superpowers/plans/2026-05-01-multi-worker-uvicorn.md

Acceptance criteria from #68
- Dockerfile updated to launch N workers via PC2NUTS_WORKERS (shell-form CMD with exec uvicorn ... --workers ${PC2NUTS_WORKERS:-1})
- README.md: new "Multi-worker deployment" subsection under ## Configuration
- docs/performance.md updated — operational follow-up post-deploy

Test plan
Automated (passing on this branch)
- tests/test_config.py::TestWorkersValidator proves the validator fires for WORKERS > 1 + no URI (4 tests covering the truth table, including empty-string aliasing)
- tests/test_limiter.py::TestLimiterStorageSelection proves the storage URI is honoured and the fallback flag is set
- PC2NUTS_WORKERS=2 without a storage URI fails immediately, with the validator's error message naming both env vars
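A hypothetical shape for the validator tests (the real ones live in tests/test_config.py and will differ in detail):

```python
# Sketch only; the import path and field names are assumptions.
import pytest
from pydantic import ValidationError

from app.config import Settings


def test_multi_worker_without_storage_uri_is_rejected(monkeypatch):
    monkeypatch.setenv("PC2NUTS_WORKERS", "2")
    monkeypatch.delenv("PC2NUTS_RATE_LIMIT_STORAGE_URI", raising=False)
    with pytest.raises(ValidationError):
        Settings()


def test_single_worker_without_storage_uri_is_fine(monkeypatch):
    monkeypatch.delenv("PC2NUTS_WORKERS", raising=False)
    monkeypatch.delenv("PC2NUTS_RATE_LIMIT_STORAGE_URI", raising=False)
    assert Settings().workers == 1
```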
Pre-merge manual (operational)

- Deploy with PC2NUTS_WORKERS=2-4 and a real Redis configured via PC2NUTS_RATE_LIMIT_STORAGE_URI
- /health returns 200 from all workers
- Send >120 requests in a minute from a single anonymous client; confirm the 120/minute cap is observed across workers (i.e. shared Redis storage works)
- Trusted-token requests still bypass the cap (exempt_when=is_trusted_request runs before the storage call)
- Take the Redis backend down and bring it back; confirm the once-per-outage WARNING and the "Rate limit storage recovered" line within ~30 s

Post-merge operational
- Re-run the performance baseline (scripts/perf_test.sh) and update docs/performance.md with the new RPS at the chosen worker count

Notes
- Dependency fix (commit d0baaeb): the initial requirements.txt declaration slowapi[redis]>=0.1.9,<1 resolves to redis<4 (slowapi 0.1.9's stale extra constraint), conflicting with the redis==7.4.0 lock pin. Switched to limits[redis]>=2.3, which exposes the modern redis>3,<8 constraint.
- The fail-degraded behaviour ships with slowapi itself (in_memory_fallback_enabled=True); the custom _FailDegradedStorage wrapper class was dropped from the design — see commit 7cf2d00.

🤖 Generated with Claude Code